ABSTRACT

Bats are the natural reservoirs of severe acute respiratory syndrome coronavirus (SARS-CoV), which caused the
outbreak of human SARS in 2002-2003. We introduce the genetic diversity of SARS-related coronaviruses
(SARSr-CoVs) discovered in bats and provide insights into the bat origin of human SARS. We also analyze the viral
geographical structure, which may improve our understanding of the evolution of bat SARSr-CoVs.

Keywords: Bats; SARS; Geographical structure; Phylogeny; Host switch

1. Introduction 

Coronaviruses (CoVs) are enveloped, positive-sense, single-stranded
RNA viruses belonging to the subfamily Coronavirinae, family
Coronaviridae, in the order Nidovirales, and are further divided into
four genera: Alpha-, Beta-, Gamma- and Deltacoronavirus (de Groot et al.,
2011; Payne, 2017). CoVs are pathogens of both birds and mammals and
have a worldwide distribution, usually causing respiratory diseases
when infecting humans. In 2002-2003, a novel coronavirus termed severe
acute respiratory syndrome (SARS) coronavirus caused > 8000 cases of
infection with a mortality of approximately 10%, drawing attention to
CoVs of zoonotic origin (Ksiazek et al., 2003; Peiris et al., 2004).
Subsequently, more CoVs were identified in humans and various animals,
including human coronavirus NL63 (HCoV-NL63), HCoV-HKU1, Middle East
respiratory syndrome coronavirus (MERS-CoV), swine acute diarrhoea
syndrome coronavirus (SADS-CoV), bat-CoV HKU4, bat-CoV HKU5, white-eye
coronavirus HKU16 (WECoV HKU16), sparrow coronavirus HKU17
(SpCoV HKU17) and magpie robin coronavirus HKU18 (MRCoV HKU18)
(Raj et al., 2014; Su et al., 2016; Woo et al., 2012; Woo et al., 2007;
Zhou et al., 2018), indicating that CoVs have greater diversity and host
range than previously estimated and remain a potential risk to public
health. Frequent contact between humans and animals carrying
coronaviruses increases the chance of cross-species viral transmission
and the emergence of new viral variants.


1.1. The emergence and tracing the animal origin of SARS 

In late 2002, SARS first emerged in Guangdong Province in southern
China and rapidly spread to other provinces and other countries,
resulting in a global pandemic of severe respiratory disease (Zhong et al.,
2003). Initial investigations indicated that masked palm civets
(Paguma larvata) sold in markets were likely to be the animal origin
of SARS coronavirus (SARS-CoV) (Guan et al., 2003; Kan et al., 2005;
Song et al., 2005), but no SARS-CoV was detected in farmed or
wild-caught civets in subsequent epidemiological studies, suggesting that
civets probably served only as intermediate hosts for SARS-CoV
transmission (Chan and Chan, 2013; Shi and Hu, 2008; Tu et al., 2004).

1.2. Discovery of bat SARS-related coronaviruses (SARSr-CoVs) 

In 2005, the discovery of novel CoVs related to human SARS-CoV
in Chinese horseshoe bats (genus Rhinolophus), named SARS-related
coronaviruses (SARSr-CoVs), provided a new clue that bats may be the
natural host of SARS-CoV (Lau et al., 2005; Li et al., 2005). Since then,
genetically diverse SARSr-CoVs have been discovered in Asia, Europe,
and Africa, including China, South Korea, Thailand, Bulgaria, Slovenia,
Italy, Luxembourg, Nigeria, and Kenya (Balboni et al., 2012b; Drexler
et al., 2010; He et al., 2014; Lau et al., 2010; Lau et al., 2005; Li et al.,
2005; Pauly et al., 2017; Ren et al., 2006; Rihtaric et al., 2010; Yang
et al., 2013; Yuan et al., 2010). Importantly, some bat SARSr-CoVs were
reported to be able to use angiotensin-converting enzyme II (ACE2)
from humans, civets and Chinese horseshoe bats as a receptor for cell
entry (Ge et al., 2013), further supporting the view that human SARS-CoV
originated from Chinese horseshoe bats and suggesting that these
SARSr-CoVs could infect humans directly, without an intermediate
host. Furthermore, serological evidence (by ELISA) of bat SARSr-CoV
infection in people living close to a bat cave in Yunnan, China, where
diverse SARSr-CoVs were detected in bats, suggested potential
spillover of SARSr-CoVs from bats to humans (Wang et al., 2018).

1.3. Genetic diversity of SARSr-CoVs 

SARS-CoV and SARSr-CoVs belong to lineage B of the genus
Betacoronavirus in the family Coronaviridae and share the same genomic
organization as other coronaviruses, including genes coding for 16
nonstructural proteins (nsp, in the ORF1ab region), the structural
proteins spike (S), envelope (E), membrane (M) and nucleocapsid (N),
and several other genes (Perlman and Netland, 2009; Woo et al., 2009).
The major differences between SARS-CoV and SARSr-CoV genomes lie
in non-structural protein 3 (nsp3), ORF3, S and ORF8, among which the
S gene and ORF8 are the most variable (Shi and Hu, 2008; Wu et al.,
2016). The spike protein encoded by the S gene can be further divided
into two subunits, S1 and S2, responsible for receptor binding and
cellular membrane fusion, respectively (Belouzard et al., 2009). The S1
subunit is composed of the N-terminal domain (NTD) and the
receptor-binding domain (RBD), the latter of which is critical for
host-receptor binding and plays an important role in determining host
range (Becker et al., 2008; de Haan et al., 2006; Li, 2013; Schickli
et al., 2004; Tusell et al., 2007). Compared with human/civet SARS-CoV,
most known SARSr-CoVs, such as Rp3 (DQ071615), have two deletions in
the RBD, while the Bulgarian strain BM48-31 (GU190215) from
Rhinolophus blasii has only one deletion in that region (Drexler et al.,
2010; He et al., 2014). Several strains, such as WIV1 (KF367457), have
the same sequence length as SARS-CoV in the RBD and have been shown to
use human ACE2 as a cellular entry receptor (Ge et al., 2013;
Hu et al., 2017; Yang et al., 2015). However, SARSr-CoVs without
any deletions have so far been discovered only in Yunnan, suggesting
that the S genes of the immediate ancestors of SARS-CoV originated in
Yunnan. ORF8 was highly variable during the course of the SARS
epidemic in China (CSME, 2004). Most bat SARSr-CoVs (except strains
HKU3-8 and Rs4084 and the African and European bat SARSr-CoVs) and the
early human SARS-CoV contain a single ORF8 (Balboni et al., 2012a).
HKU3-8 (GQ153543) has a 26-nt deletion in the ORF8 gene which
subdivides its ORF8 into ORF8a, b and c. The ORF8 of Rs4084 is split
into 8a and 8b by a 5-nt deletion, similar to the ORF8a/8b of the
middle/late human SARS-CoVs with a 29-nt deletion in ORF8. In the
European strain BM48-31, ORF8 is entirely absent (Drexler et al., 2010;
Hu et al., 2017; Lau et al., 2010). Moreover, unlike other bat
SARSr-CoVs, some viruses such as WIV1 and WIV16 have an additional ORF
(named ORFx) in their genome, which is involved in modulation of the
host immune response (Hu et al., 2017; Yang et al., 2015; Zeng et al.,
2016).
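
The ORF8 deletion lengths quoted above (26, 5 and 29 nt) are all non-multiples of three, so each would shift the downstream reading frame, which is consistent with a single ORF8 being broken into shorter ORFs. The Python sketch below only illustrates that arithmetic; the dictionary labels and the function name are ours, and the sketch does not model where the resulting stop codons fall.

```python
# Deletion lengths (nt) in ORF8 as quoted in the text above.
ORF8_DELETIONS = {
    "HKU3-8": 26,
    "Rs4084": 5,
    "middle/late human SARS-CoV": 29,
}


def shifts_reading_frame(deletion_length: int) -> bool:
    """Return True if a deletion of this length changes the codon phase downstream."""
    return deletion_length % 3 != 0


for name, length in ORF8_DELETIONS.items():
    # A remainder of 1 or 2 means the codons after the deletion are read out of phase.
    print(f"{name}: {length}-nt deletion, remainder {length % 3}, "
          f"frameshift = {shifts_reading_frame(length)}")
```

A deletion whose length is a multiple of three (for example 27 nt) would instead remove whole codons and leave the downstream frame intact.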

2. Geographical restriction of SARSr-CoVs 

2.1. The phylogeny of SARS-CoV and SARSr-CoVs 

SARSr-CoVs have been detected in bats from a wide range of
provinces in China, including Guangdong, Guangxi, Guizhou, Hebei,
Henan, Hong Kong, Hubei, Jilin, Shaanxi, Shanxi, Taiwan, Yunnan and
Zhejiang (Table 1). Except for several from Hipposideridae, these
viruses were mainly detected in bats of the family Rhinolophidae,
indicating that rhinolophid bats are likely to be the natural hosts of
SARSr-CoVs.
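
The host pattern described above can be checked directly against Table 1. The Python sketch below transcribes the table into a dictionary and tallies the distinct host species by genus and the number of host species per province; the variable and function names are ours and are not part of the original analysis.

```python
from collections import Counter

# Table 1 transcribed: province -> bat species with SARSr-CoV sequences in GenBank.
TABLE_1 = {
    "Guangdong": ["Rhinolophus sinicus"],
    "Guangxi": ["Rhinolophus pearsonii", "Rhinolophus sinicus"],
    "Guizhou": ["Rhinolophus rex", "Rhinolophus sinicus"],
    "Hebei": ["Rhinolophus ferrumequinum"],
    "Henan": ["Rhinolophus ferrumequinum"],
    "Hong Kong": ["Rhinolophus sinicus"],
    "Hubei": ["Rhinolophus ferrumequinum", "Rhinolophus macrotis",
              "Rhinolophus sinicus"],
    "Jilin": ["Rhinolophus ferrumequinum"],
    "Shaanxi": ["Rhinolophus pusillus"],
    "Shanxi": ["Rhinolophus ferrumequinum"],
    "Taiwan": ["Rhinolophus monoceros"],
    "Yunnan": ["Aselliscus stoliczkanus", "Rhinolophus affinis",
               "Rhinolophus ferrumequinum", "Rhinolophus sinicus"],
    "Zhejiang": ["Rhinolophus monoceros", "Rhinolophus pearsonii",
                 "Rhinolophus sinicus", "Rhinolophus thomasi"],
}


def distinct_species_by_genus(table):
    """Count distinct host species per genus (genus = first word of the binomial)."""
    species = {s for hosts in table.values() for s in hosts}
    return Counter(s.split()[0] for s in species)


def species_per_province(table):
    """Number of host species listed for each province."""
    return {province: len(hosts) for province, hosts in table.items()}


genus_counts = distinct_species_by_genus(TABLE_1)
province_counts = species_per_province(TABLE_1)
print(genus_counts)
print(province_counts)
```

In this tally every listed host except Aselliscus stoliczkanus belongs to the genus Rhinolophus, and Yunnan and Zhejiang carry the largest number of listed host species.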

We collected the full-length RNA-dependent RNA polymerase
(RdRp) sequences of previously reported SARSr-CoVs and SARS-CoVs
from GenBank (Table S1). We used Xia's test, the Phi test/RDP
and likelihood mapping analysis to check the saturation index,
recombination and phylogenetic signal of our data, respectively, before

Table 1
Host information and distribution of SARSr-CoVs available in GenBank.

Province     Bat species
Guangdong    Rhinolophus sinicus
Guangxi      Rhinolophus pearsonii, Rhinolophus sinicus
Guizhou      Rhinolophus rex, Rhinolophus sinicus
Hebei        Rhinolophus ferrumequinum
Henan        Rhinolophus ferrumequinum
Hong Kong    Rhinolophus sinicus
Hubei        Rhinolophus ferrumequinum, Rhinolophus macrotis, Rhinolophus sinicus
Jilin        Rhinolophus ferrumequinum
Shaanxi      Rhinolophus pusillus
Shanxi       Rhinolophus ferrumequinum
Taiwan       Rhinolophus monoceros
Yunnan       Aselliscus stoliczkanus, Rhinolophus affinis, Rhinolophus ferrumequinum, Rhinolophus sinicus
Zhejiang     Rhinolophus monoceros, Rhinolophus pearsonii, Rhinolophus sinicus, Rhinolophus thomasi


performing the phylogenetic reconstruction (Huson and Bryant, 2006;
Martin et al., 2015; Strimmer and von Haeseler, 1997; Xia, 2013).
Subsequently, we constructed a phylogenetic tree from the nucleotide
sequences of the full-length RdRp gene with the maximum likelihood
(ML) method under the GTR + I + Γ model of nucleotide substitution
as implemented in PhyML (version 3.1) (Guindon et al., 2010). The
optimal model of nucleotide substitution was determined using the
Akaike Information Criterion (AIC) available in jModelTest (version
2.1.10) (Darriba et al., 2012). Three main lineages were found in the
phylogenetic tree when HKU4-1 (EF065505) was set as an outgroup
(Fig. 1A). Lineage 1 was composed of bat SARSr-CoVs from the
southwestern provinces, including Yunnan, Guizhou and Guangxi, together
with human/civet SARS-CoV. The viruses from other southern regions,
including Guangdong, Hong Kong, Hubei and Zhejiang, made up the
second lineage (lineage 2). The third lineage (lineage 3) consisted of
strains from central and northern areas such as Hubei, Henan,
Shanxi, Shaanxi, Hebei and Jilin. Although SARS first emerged in
Guangdong province, the lineage 1 SARSr-CoVs from southwestern
China were closer to human SARS-CoV than those from other provinces in
China, including Guangdong, indicating that Guangdong is unlikely to be
the geographical origin of SARS-CoV and that the direct progenitor of
human SARS-CoV may have originated from lineage 1 (Hu et al., 2017).
Additionally, the SARSr-CoVs from adjacent provinces grouped together
(Fig. 1B), revealing that similar viruses have circulated in
neighboring provinces. The data also suggest that the bat hosts of
SARSr-CoVs in southern China are more diverse than those in other
locations.
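
Model selection by AIC, as mentioned above, ranks candidate substitution models by AIC = 2k - 2 ln L, where k is the number of free parameters and ln L is the maximised log-likelihood; the model with the lowest AIC is preferred. The Python sketch below illustrates only this ranking step. The log-likelihoods and parameter counts are made-up placeholders, not values from this study, and the function names are ours; jModelTest itself is not invoked.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(candidates: dict) -> str:
    """Return the name of the candidate with the lowest AIC.

    candidates maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Placeholder values for illustration only (NOT results from the study).
candidates = {
    "JC": (-10500.0, 0),
    "HKY+G": (-10120.0, 5),
    "GTR+I+G": (-10080.0, 10),
}

for name, (lnl, k) in candidates.items():
    print(f"{name}: lnL = {lnl}, k = {k}, AIC = {aic(lnl, k)}")
print("Preferred model:", best_model(candidates))
```

The penalty term 2k is what keeps a parameter-rich model from being chosen merely because extra parameters always raise the likelihood a little.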

2.2. Relationships between bat coronaviruses and their hosts 

Coronaviruses are single-stranded RNA viruses that mutate readily,
which increases their diversity and gives them the ability to
rapidly adapt to new hosts (Longdon et al., 2014). Nevertheless, the
evolution of CoVs is not only a consequence of coronavirus phylogeny
and biology, but also a result of the interaction between CoVs and
their hosts (Cui et al., 2007; Graham and
Baric, 2010; Longdon et al., 2014; Parrish et al., 2008). Bats are the
only mammals naturally capable of true and sustained flight. A bat
tagging exercise showed that the longest migration distance of
Chinese horseshoe bats is 17 km and that other Rhinolophus species may
migrate up to 30 km for hibernation (Lau et al., 2010). Such migration
distances would aid the transmission of SARSr-CoVs carried by bats
within a certain geographical range. To identify the relationships
between bat CoVs and their hosts, a tanglegram was made connecting
the RdRp phylogeny of the SARSr-CoVs and the cytochrome b
(CytB) phylogeny of their hosts (Fig. 2; Table S2). Different bat species
in the same location, such as Yunnan, Guizhou and Zhejiang, harbor closely
Fig. 1. Phylogenetic analysis of SARS-CoVs and bat SARSr-CoVs. (A) The phylogenetic tree was constructed using the complete RdRp coding sequences and viewed in
iTOL (http://itol.embl.de/). All strains were named using abbreviations of virus ID and sampling province. The strain HKU4-1 (NC_009019) was used as the
outgroup. The taxa of lineage 1, lineage 2 and lineage 3 are highlighted in light red, light green and light blue, respectively, and the three lineages
are marked with colored pentagrams. The only European strain, BM48-31 (GU190215), is highlighted in light purple. The branch of
SARS-CoV is marked in red. The strains from Zhejiang were collapsed into a triangle named ZJ-SL/Zhejiang. The viruses from Hong Kong were also collapsed into a
triangle named HKU3/Hong Kong. The numbers adjacent to the nodes represent bootstrap values from 1000 replicates; only bootstrap values >70% are shown.
(B) The geographical distribution of bat SARSr-CoVs and their hosts from lineage 1, lineage 2 and lineage 3 in panel A. South Korea and the Chinese
provinces of the three lineages are marked on the map using the corresponding colored pentagrams. The bat hosts are also mapped to the corresponding regions on
the map. Location abbreviations are as follows: YN, Yunnan; GZ, Guizhou; GX, Guangxi; GD, Guangdong; HK, Hong Kong; SAX, Shaanxi; HuB, Hubei; SX, Shanxi;
HeN, Henan; HeB, Hebei; ZJ, Zhejiang; JL, Jilin; SK, South Korea. (For interpretation of the references to colour in this figure legend, the reader is referred to the web
version of this article.)


related SARSr-CoVs, suggesting the lack of a strict host restriction and
the existence of host shifts in bat SARSr-CoVs (Cui et al., 2007). In
addition, host shifts mostly happened between different species within
the same genus, Rhinolophus, indicating that the genetic distance
between hosts is a key factor determining both host shifts and
cross-species transmission. Moreover, SARSr-CoVs clustered by adjacent
province rather than by host, even among viruses sampled from the same
bat species, further supporting the view that the evolution of
SARSr-CoVs is restricted by geography rather than by bat species.

2.3. Recombinant origin of SARS-CoV from bat SARSr-CoVs 

Recombination plays a significant role in the evolution of viruses
and may create emerging viruses and expand their host range (Graham and
Baric, 2010; Vennema et al., 1998). Recombination events have been
discovered in SARS-CoV and bat SARSr-CoVs (Graham and Baric, 2010;
Hon et al., 2008). The two major recombination hotspots between bat
SARSr-CoVs and SARS-CoV are the S gene and ORF8, which probably
contributes to the variability of these two genes (Hon et al., 2008; Lau
et al., 2015; Wu et al., 2016). All the genomic building blocks of
SARS-CoV, including the hypervariable regions S and ORF8, were found in
different bat SARSr-CoVs in the same cave in Yunnan, with evidence of
recombination events between these bat SARSr-CoVs (Hu
et al., 2017), suggesting that human SARS-CoV may have originated from
recombination among bat SARSr-CoVs in this region. Because SARSr-CoVs
without any deletion in the RBD have been identified only in Yunnan, the
S genes of human SARS-CoVs likely derived from recombination among
these viruses in Yunnan. As recombination occurs frequently among bat
SARSr-CoVs, further genomic characterization of bat SARSr-CoVs across a
broader range of host species and geographical origins is needed
to understand the role recombination plays in the evolution of
SARSr-CoVs.

3. Conclusions 

As bats have been identified as the natural reservoirs of various
emerging viruses, the concept of a zoonotic origin of important viral
pathogens has become widely accepted (Parrish et al., 2008). Deciphering
the evolution of a viral pathogen is vital for understanding the
context of its emergence. Although SARS was controlled and disappeared
in 2004, the recently identified SARSr-CoVs that are able to use the
human ACE2 receptor pose a potential risk of future emergence
(Ge et al., 2013; Graham and Baric, 2010; Parrish et al., 2008). In
particular, serological evidence of bat SARSr-CoV infection in humans
was discovered in Yunnan, suggesting that these viruses may have
spilled over to humans from bats, directly or via intermediate hosts, in
Yunnan.

To date, bat SARSr-CoVs have been discovered in Asia, Europe
and Africa (Balboni et al., 2012b; Drexler et al., 2010; He et al., 2014;
Lau et al., 2010, 2005; Li et al., 2005; Ren et al., 2006; Rihtaric et al.,
2010; Yang et al., 2013; Yuan et al., 2010). However, for most of the
strains from countries other than China, only partial RdRp fragments
were obtained, and full-length genome sequences have been determined
for only a few of them; thus the available genetic information is
insufficient to explore the evolution and spread of these SARSr-CoVs.
Phylogenies based on these short sequences of currently known SARSr-CoVs
indicate that the bat SARSr-CoVs from China are closer to human
SARS-CoV than those from other countries (Ar Gouilh et al., 2018;
Drexler et al., 2010; Quan et al., 2010; Rihtaric et al., 2010), suggesting



Fig. 2. Phylogenetic relationships between human/civet SARS-CoVs, bat SARSr-CoVs and their hosts. The left tree was inferred from the nucleotide sequences of the
complete RdRp gene of human/civet SARS-CoVs and bat SARSr-CoVs. The right tree was constructed using the nucleotide sequences of the CytB gene of the hosts. The
branch of SARS-CoV in the left tree is marked in red. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of
this article.)
that human SARS-CoV may have originated from China.

Our analysis revealed that human SARS-CoV may have originated
from southwestern China, including Yunnan, Guangxi and Guizhou, and
that similar viruses likely circulated in these provinces for an
extended period before eventually emerging in humans. In addition,
SARSr-CoVs clustered according to the geographical location of
sampling, indicating that geographical range overlap between hosts is
likely to play an important role in shaping the evolution of these
viruses (Faria et al., 2013). Co-phylogeny analysis indicated the lack
of a strict host restriction and the existence of frequent host shifts
in bat SARSr-CoVs, occurring mainly among horseshoe bats (genus
Rhinolophus), possibly because closely related hosts offer a similar
environment for the virus to adapt to (Longdon et al., 2014). However,
space presents a greater barrier than host species to the
diversification of bat SARSr-CoVs. Most importantly, cross-species
transmission and frequent recombination of SARSr-CoVs within horseshoe
bat populations in Yunnan could eventually have led to the generation
of human SARS-CoV (Graham and Baric, 2010; Hon et al., 2008; Hu et al.,
2017). Although Rhinolophus species may migrate up to 30 km (Lau et al.,
2010), it is very unlikely that they migrate over a long distance such
as from Yunnan to Guangdong.
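
To put the 30 km migration range in perspective, the Python sketch below estimates a great-circle distance between Yunnan and Guangdong with the haversine formula. Using the provincial capitals Kunming and Guangzhou as stand-ins for the two provinces is our assumption, as are the approximate coordinates; the actual cave and market locations are not given in the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Assumed proxy locations (approximate latitude, longitude in degrees).
KUNMING = (25.04, 102.71)     # capital of Yunnan
GUANGZHOU = (23.13, 113.26)   # capital of Guangdong

MAX_MIGRATION_KM = 30  # upper migration distance quoted from Lau et al. (2010)

distance_km = haversine_km(*KUNMING, *GUANGZHOU)
print(f"Kunming to Guangzhou: about {distance_km:.0f} km, "
      f"or {distance_km / MAX_MIGRATION_KM:.0f} times the quoted migration range")
```

Under these assumptions the two cities are roughly a thousand kilometres apart, more than thirty times the longest migration distance quoted for Rhinolophus species.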

Some gaps remain to be filled regarding the origin of human
SARS-CoV. If human SARS-CoV originated from bats in
southwestern China, including Yunnan, Guangxi and Guizhou, how it was
transmitted to Guangdong, where human SARS first appeared, is unclear
and needs to be clarified in the future. Although serological evidence
of bat SARSr-CoV infection was discovered in people living in
proximity to the cave where diverse SARSr-CoVs are circulating (Wang
et al., 2018), it cannot yet be determined whether the SARSr-CoVs
infecting those people came from bats or from other animals living
alongside bats. In short, it is necessary to carry out continuous
surveillance of SARSr-CoVs in different geographical locations, targeting
different bat species and surrounding animals.